In this example we’ll work with the idea of flipping a coin. We’re interested in the number of heads that are flipped in n flips. We can flip a virtual coin here. If we flip this “coin” n times, we would expect the number of heads to be approximately n/2, especially as n gets large. This is straightforward enough, and with our understanding of probability we can answer this question without ever flipping a coin. However, what if we suddenly forgot all of our probability theory? Is there still a way to calculate this answer without manually flipping a coin hundreds of times?

Single coin

Randomly sample one value from the set [0, 1] where the probability of 0 is 3/4 and the probability of 1 is 1/4.

sample(c(0, 1), 1, prob = c(3/4, 1/4))
[1] 0

Simulate the process of tossing a bent coin one time, where the probability of a head is p = 1/4

sample(c(0, 1), 1, prob = c(3/4, 1/4))
[1] 0

Multiple coins

Simulate the process of tossing a bent coin 60 times, where the probability of a head is p = 1/4.

sample(c(0, 1), 60, prob = c(3/4, 1/4), replace = TRUE)
 [1] 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[55] 0 1 0 0 0 0
sample(c(0, 1), 60, prob = c(3/4, 1/4), replace = TRUE)
 [1] 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1
[55] 0 1 0 0 0 0

Notice that the results above are different - why?

You can use set.seed() to ensure results are identical when random elements are involved.

set.seed(35487)
sample(c(0, 1), 60, prob = c(3/4, 1/4), replace = TRUE)
 [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[55] 0 0 0 1 0 0
set.seed(35487)
sample(c(0, 1), 60, prob = c(3/4, 1/4), replace = TRUE)
 [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[55] 0 0 0 1 0 0

Simulation study

Now that we have the building blocks, we’ll simulate this process multiple times to provide a reliable estimate of the average number of heads out of 60 tosses. There are multiple approaches that can be used here, and we’ll look at each one.

In order to build these simulation examples, let’s setup a few things first. We’ll define the number of simulations we run as n_sims and we’ll also create a function for a single “run” of the simulation.

n_sims <- 1000

flip_coins <- function(n_flips = 60) {
  sample(c(0, 1), n_flips, prob = c(3, 1), replace = TRUE)
}
flip_coins()
 [1] 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
[55] 0 1 0 1 0 0

For Loop

For loops are a staple programming concept. However, given R’s ability to vectorize operations and other R-centric functions (the apply family), there are often better ways to iterate through objects.

heads <- numeric(n_sims)
for (i in seq_along(heads)) {
  heads[i] <- sum(flip_coins())
}

head(heads)
[1] 13 11 12 15 14 14

Now that we have the number of heads that appeared in 1000 simulations of flipping this coin 60 times, we can compute the average number of heads that appear in this scenario.

mean(heads)
[1] 14.994

Sapply function

heads <- sapply(1:n_sims, function(x) sum(flip_coins()))
head(heads)
[1] 15 10 12 13 21 13

Replicate function

replicate is part of the apply family and it lends itself well to simulation studies.

heads <- replicate(n = n_sims, sum(flip_coins()))
head(heads)
[1] 12 17 16 12 17 14

Tidyverse

The tidyverse “is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.”

The purrr package is part of the tidyverse and provides an alternative to the apply family of functions.

# install.packages("tidyverse")
library(tidyverse)

rerun(n_sims, flip_coins()) %>% 
  map_dbl(sum) %>% 
  mean()
[1] 15.154

Calculations

So far, we’ve used our simulated data to determine the average number of heads in a set of 60 unfair coin flips. We can also use this data to calculate other metrics.

Determine the variance of the number of heads in 60 coin flips.

var(heads)
[1] 11.63123

We can also find the probability that the number of heads in 60 tosses is greater than 20.

mean(heads > 20)
[1] 0.058

Finally, we can find the 25th percentile of the number of heads in 60 tosses.

quantile(heads, 0.25)
25% 
 13 
LS0tCnRpdGxlOiAiQ29pbiBGbGlwIFNpbXVsYXRpb24iCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCmBgYHtyIHNldHVwLCBpbmNsdWRlPUZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQoY2FjaGUgPSBUUlVFLCBtZXNzYWdlID0gRkFMU0UpCnNldC5zZWVkKDM1NDg3KQpgYGAKCkluIHRoaXMgZXhhbXBsZSB3ZSdsbCB3b3JrIHdpdGggdGhlIGlkZWEgb2YgZmxpcHBpbmcgYSBjb2luLiBXZSdyZSBpbnRlcmVzdGVkIGluCnRoZSBudW1iZXIgb2YgaGVhZHMgdGhhdCBhcmUgZmxpcHBlZCBpbiBgbmAgZmxpcHMuIFdlIGNhbiBmbGlwIGEgdmlydHVhbCBjb2luCltoZXJlXShodHRwczovL2p1c3RmbGlwYWNvaW4uY29tLz9jPWJsYWNrKS4gSWYgd2UgZmxpcCB0aGlzICJjb2luIiBgbmAgdGltZXMsIHdlCndvdWxkIGV4cGVjdCB0aGUgbnVtYmVyIG9mIGhlYWRzIHRvIGJlIGFwcHJveGltYXRlbHkgYG4vMmAsIGVzcGVjaWFsbHkgYXMgYG5gCmdldHMgbGFyZ2UuIFRoaXMgaXMgc3RyYWlnaHRmb3J3YXJkIGVub3VnaCwgYW5kIHdpdGggb3VyIHVuZGVyc3RhbmRpbmcgb2YKcHJvYmFiaWxpdHkgd2UgY2FuIGFuc3dlciB0aGlzIHF1ZXN0aW9uIHdpdGhvdXQgZXZlciBmbGlwcGluZyBhIGNvaW4uIEhvd2V2ZXIsCndoYXQgaWYgd2Ugc3VkZGVubHkgZm9yZ290IGFsbCBvZiBvdXIgcHJvYmFiaWxpdHkgdGhlb3J5PyBJcyB0aGVyZSBzdGlsbCBhIHdheQp0byBjYWxjdWxhdGUgdGhpcyBhbnN3ZXIgKndpdGhvdXQqIG1hbnVhbGx5IGZsaXBwaW5nIGEgY29pbiBodW5kcmVkcyBvZiB0aW1lcz8KCiMjIyBTaW5nbGUgY29pbgpSYW5kb21seSBzYW1wbGUgb25lIHZhbHVlIGZyb20gdGhlIHNldCBbMCwgMV0gd2hlcmUgdGhlIHByb2JhYmlsaXR5IG9mIDAgaXMgMy80CmFuZCB0aGUgcHJvYmFiaWxpdHkgb2YgMSBpcyAxLzQuCgpgYGB7cn0Kc2FtcGxlKGMoMCwgMSksIDEsIHByb2IgPSBjKDMvNCwgMS80KSkKYGBgCgpTaW11bGF0ZSB0aGUgcHJvY2VzcyBvZiB0b3NzaW5nIGEgYmVudCBjb2luIG9uZSB0aW1lLCB3aGVyZSB0aGUgcHJvYmFiaWxpdHkgb2YgYQpoZWFkIGlzIHAgPSAxLzQKCmBgYHtyfQpzYW1wbGUoYygwLCAxKSwgMSwgcHJvYiA9IGMoMy80LCAxLzQpKQpgYGAKCiMjIyBNdWx0aXBsZSBjb2lucwpTaW11bGF0ZSB0aGUgcHJvY2VzcyBvZiB0b3NzaW5nIGEgYmVudCBjb2luIDYwIHRpbWVzLCB3aGVyZSB0aGUgcHJvYmFiaWxpdHkgb2YgYQpoZWFkIGlzIHAgPSAxLzQuCgpgYGB7cn0Kc2FtcGxlKGMoMCwgMSksIDYwLCBwcm9iID0gYygzLzQsIDEvNCksIHJlcGxhY2UgPSBUUlVFKQpzYW1wbGUoYygwLCAxKSwgNjAsIHByb2IgPSBjKDMvNCwgMS80KSwgcmVwbGFjZSA9IFRSVUUpCmBgYAoKTm90aWNlIHRoYXQgdGhlIHJlc3VsdHMgYWJvdmUgYXJlIGRpZmZlcmVudCAtIHdoeT8KCllvdSBjYW4gdXNlIGBzZXQuc2VlZCgpYCB0byBlbnN1cmUgcmVzdWx0cyBhcmUgaWRlbnRpY2FsIHdoZW4gcmFuZG9tIGVsZW1lbnRzIGFyZQppbnZvbHZlZC4KCmBgYHtyfQpzZXQuc2VlZCgzNTQ4NykKc2FtcGxlKGMoMCwgMSksIDYwLCBwcm9iID0gYygzLzQsIDEvNCksIHJlcGxhY2UgPSBUUlVFKQoKc2V0LnNlZWQoMzU0ODcpCnNhbXBsZShjKDAsIDEpLCA2MCwgcHJvYiA9IGMoMy80LCAxLzQpLCByZXBsYWNlID0gVFJVRSkKYGBgCgojIyMgU2ltdWxhdGlvbiBzdHVkeQpOb3cgdGhhdCB3ZSBoYXZlIHRoZSBidWlsZGluZyBibG9ja3MsIHdlJ2xsIHNpbXVsYXRlIHRoaXMgcHJvY2VzcyBtdWx0aXBsZSB0aW1lcwp0byBwcm92aWRlIGEgcmVsaWFibGUgZXN0aW1hdGUgb2YgdGhlIGF2ZXJhZ2UgbnVtYmVyIG9mIGhlYWRzIG91dCBvZiA2MCB0b3NzZXMuClRoZXJlIGFyZSBtdWx0aXBsZSBhcHByb2FjaGVzIHRoYXQgY2FuIGJlIHVzZWQgaGVyZSwgYW5kIHdlJ2xsIGxvb2sgYXQgZWFjaCBvbmUuCgpJbiBvcmRlciB0byBidWlsZCB0aGVzZSBzaW11bGF0aW9uIGV4YW1wbGVzLCBsZXQncyBzZXR1cCBhIGZldyB0aGluZ3MgZmlyc3QuIFdlJ2xsCmRlZmluZSB0aGUgbnVtYmVyIG9mIHNpbXVsYXRpb25zIHdlIHJ1biBhcyBgbl9zaW1zYCBhbmQgd2UnbGwgYWxzbyBjcmVhdGUgYSAKZnVuY3Rpb24gZm9yIGEgc2luZ2xlICJydW4iIG9mIHRoZSBzaW11bGF0aW9uLgoKYGBge3J9Cm5fc2ltcyA8LSAxMDAwCgpmbGlwX2NvaW5zIDwtIGZ1bmN0aW9uKG5fZmxpcHMgPSA2MCkgewogIHNhbXBsZShjKDAsIDEpLCBuX2ZsaXBzLCBwcm9iID0gYygzLCAxKSwgcmVwbGFjZSA9IFRSVUUpCn0KYGBgCgpgYGB7cn0KZmxpcF9jb2lucygpCmBgYAoKCiMjIyMgRm9yIExvb3AKRm9yIGxvb3BzIGFyZSBhIHN0YXBsZSBwcm9ncmFtbWluZyBjb25jZXB0LiBIb3dldmVyLCBnaXZlbiBSJ3MgYWJpbGl0eSB0byB2ZWN0b3JpemUKb3BlcmF0aW9ucyBhbmQgb3RoZXIgUi1jZW50cmljIGZ1bmN0aW9ucyAodGhlIGBhcHBseWAgZmFtaWx5KSwgdGhlcmUgYXJlIG9mdGVuCmJldHRlciB3YXlzIHRvIGl0ZXJhdGUgdGhyb3VnaCBvYmplY3RzLgoKYGBge3J9CmhlYWRzIDwtIG51bWVyaWMobl9zaW1zKQpmb3IgKGkgaW4gc2VxX2Fsb25nKGhlYWRzKSkgewogIGhlYWRzW2ldIDwtIHN1bShmbGlwX2NvaW5zKCkpCn0KCmhlYWQoaGVhZHMpCmBgYAoKTm93IHRoYXQgd2UgaGF2ZSB0aGUgbnVtYmVyIG9mIGhlYWRzIHRoYXQgYXBwZWFyZWQgaW4gMTAwMCBzaW11bGF0aW9ucyBvZiBmbGlwcGluZwp0aGlzIGNvaW4gNjAgdGltZXMsIHdlIGNhbiBjb21wdXRlIHRoZSBhdmVyYWdlIG51bWJlciBvZiBoZWFkcyB0aGF0IGFwcGVhciBpbiB0aGlzCnNjZW5hcmlvLgoKYGBge3J9Cm1lYW4oaGVhZHMpCmBgYAoKIyMjIyBTYXBwbHkgZnVuY3Rpb24KCmBgYHtyfQpoZWFkcyA8LSBzYXBwbHkoMTpuX3NpbXMsIGZ1bmN0aW9uKHgpIHN1bShmbGlwX2NvaW5zKCkpKQpoZWFkKGhlYWRzKQpgYGAKCiMjIyMgUmVwbGljYXRlIGZ1bmN0aW9uCmByZXBsaWNhdGVgIGlzIHBhcnQgb2YgdGhlIGBhcHBseWAgZmFtaWx5IGFuZCBpdCBsZW5kcyBpdHNlbGYgd2VsbCB0byBzaW11bGF0aW9uCnN0dWRpZXMuCgpgYGB7cn0KaGVhZHMgPC0gcmVwbGljYXRlKG4gPSBuX3NpbXMsIHN1bShmbGlwX2NvaW5zKCkpKQpoZWFkKGhlYWRzKQpgYGAKCiMjIyMgVGlkeXZlcnNlClRoZSBbdGlkeXZlcnNlXShodHRwczovL3d3dy50aWR5dmVyc2Uub3JnLykgImlzIGFuIG9waW5pb25hdGVkIGNvbGxlY3Rpb24gb2YgUiAKcGFja2FnZXMgZGVzaWduZWQgZm9yIGRhdGEgc2NpZW5jZS4gQWxsIHBhY2thZ2VzIHNoYXJlIGFuIHVuZGVybHlpbmcgZGVzaWduIApwaGlsb3NvcGh5LCBncmFtbWFyLCBhbmQgZGF0YSBzdHJ1Y3R1cmVzLiIKClRoZSBgcHVycnJgIHBhY2thZ2UgaXMgcGFydCBvZiB0aGUgYHRpZHl2ZXJzZWAgYW5kIHByb3ZpZGVzIGFuIGFsdGVybmF0aXZlIHRvIAp0aGUgYGFwcGx5YCBmYW1pbHkgb2YgZnVuY3Rpb25zLgoKYGBge3J9CiMgaW5zdGFsbC5wYWNrYWdlcygidGlkeXZlcnNlIikKbGlicmFyeSh0aWR5dmVyc2UpCgpyZXJ1bihuX3NpbXMsIGZsaXBfY29pbnMoKSkgJT4lIAogIG1hcF9kYmwoc3VtKSAlPiUgCiAgbWVhbigpCmBgYAoKCiMjIyBDYWxjdWxhdGlvbnMKU28gZmFyLCB3ZSd2ZSB1c2VkIG91ciBzaW11bGF0ZWQgZGF0YSB0byBkZXRlcm1pbmUgdGhlIGF2ZXJhZ2UgbnVtYmVyIG9mIGhlYWRzCmluIGEgc2V0IG9mIDYwIHVuZmFpciBjb2luIGZsaXBzLiBXZSBjYW4gYWxzbyB1c2UgdGhpcyBkYXRhIHRvIGNhbGN1bGF0ZSBvdGhlcgptZXRyaWNzLgoKRGV0ZXJtaW5lIHRoZSB2YXJpYW5jZSBvZiB0aGUgbnVtYmVyIG9mIGhlYWRzIGluIDYwIGNvaW4gZmxpcHMuCgpgYGB7cn0KdmFyKGhlYWRzKQpgYGAKCldlIGNhbiBhbHNvIGZpbmQgdGhlIHByb2JhYmlsaXR5IHRoYXQgdGhlIG51bWJlciBvZiBoZWFkcyBpbiA2MCB0b3NzZXMgaXMgZ3JlYXRlcgp0aGFuIDIwLgpgYGB7cn0KbWVhbihoZWFkcyA+IDIwKQpgYGAKCkZpbmFsbHksIHdlIGNhbiBmaW5kIHRoZSAyNXRoIHBlcmNlbnRpbGUgb2YgdGhlIG51bWJlciBvZiBoZWFkcyBpbiA2MCB0b3NzZXMuCmBgYHtyfQpxdWFudGlsZShoZWFkcywgMC4yNSkKYGBgCg==